ABSTRACT

In this paper, we analyze the convergence of a distributed 
Robbins-Monro algorithm for both constrained and uncon- 
strained optimization in multi-agent systems. The algorithm 
searches for local minima of a (nonconvex) objective func- 
tion which is supposed to coincide with a sum of local utility 
functions of the agents. The algorithm under study consists 
of two steps: a local stochastic gradient descent at each 
agent and a gossip step that drives the network of agents to 
a consensus. It is proved that i) an agreement is achieved 
between agents on the value of the estimate, ii) the algorithm
converges to the set of Kuhn-Tucker points of the optimization
problem. The proof relies on recent results about
differential inclusions. In the context of unconstrained op- 
timization, intelligible sufficient conditions are provided in 
order to ensure the stability of the algorithm. In the latter 
case, we also provide a central limit theorem which governs 
the asymptotic fluctuations of the estimate. We illustrate 
our results in the case of distributed power allocation for 
ad-hoc wireless networks. 

1. INTRODUCTION 

The Robbins-Monro (R-M) algorithm is a widely used
procedure for finding the roots of an unknown function. Its
applications range from Statistics (e.g. [2]) and Machine Learning
(e.g. [3]) to Electrical Engineering (e.g. [4]) and Communication
Networks. Consider the problem of minimizing a given differentiable
function $f$. Formally, an R-M algorithm for that purpose can be
summarized as an iterative scheme of the form
$\theta_{n+1} = \theta_n + \gamma_{n+1}(-\nabla f(\theta_n) + \xi_{n+1})$,
where the sequence $(\theta_n)_{n \in \mathbb{N}}$ will eventually
converge to a local minimum of $f$, and where $\xi_{n+1}$ represents
a random perturbation.
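The iterative scheme above can be illustrated with a minimal sketch. It assumes a step size of the form $\gamma_n = \gamma_0/n^{\xi}$ and Gaussian perturbations; the function name `robbins_monro`, its parameters and the quadratic test function are hypothetical choices made for illustration only.

```python
import numpy as np

def robbins_monro(grad, theta0, n_iter=5000, gamma0=1.0, xi=1.0,
                  noise_std=1.0, seed=0):
    """Iterate theta <- theta + gamma_n * (-grad(theta) + noise)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for n in range(1, n_iter + 1):
        gamma = gamma0 / n**xi                      # decreasing step size (assumed form)
        noise = noise_std * rng.standard_normal(theta.shape)
        theta = theta + gamma * (-grad(theta) + noise)
    return theta

# Illustrative objective f(theta) = 0.5 * |theta - 3|^2, whose gradient is theta - 3.
theta_hat = robbins_monro(lambda th: th - 3.0, theta0=np.zeros(1))
```

With these choices the iterate settles near the minimizer 3 even though only noisy gradients are observed.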

In this paper, we investigate a distributed version of the
R-M algorithm. Distributed algorithms have aroused deep
interest in the fields of communications, signal processing,
control, robotics and computer technology, among others. The
success of distributed algorithms lies in their scalability, but
they are often harder to analyze than their centralized
counterparts. We analyze the behavior of a network of agents,
represented as a graph, where each node/agent runs its own local
R-M algorithm and then randomly communicates with one of its
neighbors in the hope of gradually reaching a consensus over the
whole network. One well-established device for reaching a consensus
in a network is to use gossip algorithms [5]. Since the seminal
paper of [6], random gossip algorithms have been widely studied as
they encompass asynchronous networks with random switching graph
topologies. In [5], the authors introduce an iterative algorithm
for the optimization of an objective function in a parallel
setting. The method consists of an iterative gradient search
combined with a gossip step. More recently, this algorithm has been
studied by [7] in the case where the objective function is the
aggregate of some local utility functions of the agents, assuming
that a given agent is only able to evaluate a (noisy version of
the) gradient/subgradient of its own utility function. An
alternative performance analysis is proposed by [5] in a linear
regression perspective.

In this paper, we consider a network composed of $N \geq 1$
agents. A given continuously differentiable utility function
$f_i : \mathbb{R}^d \to \mathbb{R}$ is associated with each agent
$i = 1, \dots, N$, where $d$ is an integer. We investigate the
following minimization problem:

$$\min_{\theta \in G} \sum_{i=1}^{N} f_i(\theta) \qquad (1)$$

where $G$ is a subset of $\mathbb{R}^d$ supposed to be known by each
agent. We are interested in two distinct cases: first, the case
of unconstrained minimization ($G = \mathbb{R}^d$); second, the case
where $G$ is a compact convex subset specified by inequality
constraints. However, we do not suppose that the objective
function $f := \sum_i f_i$ is convex. Moreover, we consider the
context of stochastic approximation: each agent observes
a random sequence of noisy observations of the gradient
$\nabla f_i$. We are interested in on-line estimates of local
solutions to (1) using a distributed R-M algorithm.

Our contribution is the following. A distributed R-M algorithm
is introduced following [5], [7], [8]. It is proved to converge to
a consensus with probability one (w.p.1), that is, all agents
eventually reach an agreement on their estimate of the local
solution to the minimization problem (1). In addition, each agent's
estimate converges to the set of Kuhn-Tucker points
$\mathcal{L}_{KT}$ of (1) under some assumptions. In the
unconstrained case, the proof is based on the existence of a
well-behaved Lyapunov function which ensures the stability of the
algorithm. In the constrained case, the proof relies on recent
results of [10] about perturbed differential inclusions.

The paper is organized as follows. Section 2 introduces the
distributed algorithm and the main assumptions on the network and
the observation model. In Section 3, we analyze the behavior of the
algorithm in the case of unconstrained optimization
($G = \mathbb{R}^d$). We prove the almost sure agreement and the
almost sure convergence of the algorithm. We provide the speed of
convergence as well as a Central Limit Theorem on the estimates. In
Section 4, we investigate the case where the domain $G$ is
determined by a set of inequality constraints. Agreement and almost
sure convergence to Kuhn-Tucker points are shown. Section 5
provides an example of application to distributed power allocation
for ad-hoc wireless networks.

2. THE DISTRIBUTED ALGORITHM 
2.1 Description of the Algorithm 

Each node $i$ generates a stochastic process
$(\theta_{n,i})_{n \geq 1}$ in $\mathbb{R}^d$ using a two-step
iterative algorithm:

[Local step] Node $i$ generates at time $n$ a temporary iterate
$\tilde\theta_{n,i}$ given by

$$\tilde\theta_{n,i} = P_G\left[\theta_{n-1,i} + \gamma_n Y_{n,i}\right] , \qquad (2)$$

where $\gamma_n$ is a deterministic positive step size, $Y_{n,i}$ is
a random variable, and $P_G$ represents the projection operator
onto the set $G$. In particular, $P_G$ is equal to the identity
map in case $G$ is taken to be the whole space $\mathbb{R}^d$. The
random variable $Y_{n,i}$ is to be interpreted as a perturbed
version of the opposite gradient of $f_i$ at point
$\theta_{n-1,i}$. As will be made clear by Assumption A1f) below,
it is convenient to think of $Y_{n,i}$ as
$Y_{n,i} = -\nabla f_i(\theta_{n-1,i}) + \delta M_{n,i}$, where
$(\delta M_{n,i})_n$ is a martingale increment sequence which
stands for the random perturbation.

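[Gossip step] Node $i$ is able to observe the values
$\tilde\theta_{n,j}$ of some other $j$'s and computes the weighted
average:

$$\theta_{n,i} = \sum_{j=1}^{N} w_n(i,j)\, \tilde\theta_{n,j}$$

where $W_n := [w_n(i,j)]_{i,j=1}^{N}$ is a stochastic matrix.

We cast this algorithm into a more compact vector form. Define the
random vectors $\boldsymbol{\theta}_n$ and $Y_n$ as
$\boldsymbol{\theta}_n := (\theta_{n,1}^T, \dots, \theta_{n,N}^T)^T$
and $Y_n := (Y_{n,1}^T, \dots, Y_{n,N}^T)^T$, where $^T$ denotes
transposition. The algorithm reduces to:

$$\boldsymbol{\theta}_n = (W_n \otimes I_d)\, P_{G^N}\left[\boldsymbol{\theta}_{n-1} + \gamma_n Y_n\right] \qquad (3)$$

where $\otimes$ denotes the Kronecker product, $I_d$ is the
$d \times d$ identity matrix and $P_{G^N}$ is the projector onto the
$N$th order product set $G^N := G \times \cdots \times G$.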
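The equivalence between the per-node description (local step followed by gossip step) and the compact Kronecker form can be checked numerically. The sketch below is illustrative only: the box $[-1, 1]^d$ is a hypothetical choice for $G$, and the function names are not taken from the paper.

```python
import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    # Hypothetical constraint set G = [lo, hi]^d; the projection is a clip.
    return np.clip(x, lo, hi)

def distributed_step(theta, Y, W, gamma, project=project_box):
    """Per-node form. theta, Y: (N, d) arrays, row i belongs to agent i."""
    tilde = project(theta + gamma * Y)    # local step at every agent
    return W @ tilde                      # gossip step: weighted averages

def distributed_step_vector(theta, Y, W, gamma, project=project_box):
    """Compact form (W kron I_d) P_{G^N}[theta + gamma * Y] on stacked vectors."""
    N, d = theta.shape
    stacked = project(theta.reshape(-1) + gamma * Y.reshape(-1))
    return (np.kron(W, np.eye(d)) @ stacked).reshape(N, d)

rng = np.random.default_rng(0)
N, d = 4, 3
theta = rng.uniform(-1.0, 1.0, (N, d))
Y = rng.standard_normal((N, d))
W = np.full((N, N), 1.0 / N)              # doubly stochastic: full averaging
a = distributed_step(theta, Y, W, 0.1)
b = distributed_step_vector(theta, Y, W, 0.1)
```

Both functions return the same array. With the full-averaging matrix used here, all rows of the result coincide, i.e. the agents agree after a single gossip step.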

2.2 Observation and Network Models 

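The time-varying communication network between the nodes
is represented by the sequence of random matrices
$(W_n)_{n \geq 1}$. For any $n \geq 1$, we introduce the
$\sigma$-field
$\mathcal{F}_n = \sigma(\boldsymbol{\theta}_0, Y_{1:n}, W_{1:n})$.
The distribution of the random vector $Y_{n+1}$ conditionally on
$\mathcal{F}_n$ is assumed to be such that:

$$\mathbb{P}\left[Y_{n+1} \in A \mid \mathcal{F}_n\right] = \mu_{\boldsymbol{\theta}_n}(A)$$

for any measurable set $A$, where
$(\mu_{\boldsymbol{\theta}})_{\boldsymbol{\theta} \in \mathbb{R}^{dN}}$
is a given family of probability measures on $\mathbb{R}^{dN}$. For
any $\boldsymbol{\theta} \in \mathbb{R}^{dN}$, define
$\mathbb{E}_{\boldsymbol{\theta}}[g(Y)] := \int g(y)\, \mu_{\boldsymbol{\theta}}(dy)$.
Denote by $\mathbf{1}$ the $N \times 1$ vector whose components are
all equal to one. Denote by $|x|$ the Euclidean norm of any vector
$x$. It is assumed that: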

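Assumption 1. The following conditions hold:

a) Matrix $W_n$ is doubly stochastic:
$W_n \mathbf{1} = W_n^T \mathbf{1} = \mathbf{1}$.

b) $(W_n)_{n \geq 1}$ is a sequence of square-integrable
matrix-valued random variables. The spectral radius $\rho_n$ of
matrix $\mathbb{E}(W_n W_n^T) - \mathbf{1}\mathbf{1}^T/N$ satisfies:

$$\lim_{n \to \infty} n(1 - \rho_n) = +\infty .$$

c) For any positive measurable functions $g_1, g_2$,

$$\mathbb{E}\left[g_1(W_{n+1})\, g_2(Y_{n+1}) \mid \mathcal{F}_n\right] = \mathbb{E}\left[g_1(W_{n+1})\right] \mathbb{E}_{\boldsymbol{\theta}_n}\left[g_2(Y)\right] .$$

d) $\boldsymbol{\theta}_0 \in G^N$ and
$\mathbb{E}[|\boldsymbol{\theta}_0|^2] < +\infty$.

e) For any $i = 1, \dots, N$, $f_i$ is continuously differentiable.

f) For any
$\boldsymbol{\theta} = (\theta_1^T, \dots, \theta_N^T)^T$,

$$\mathbb{E}_{\boldsymbol{\theta}}[Y] = -\left(\nabla f_1(\theta_1)^T, \dots, \nabla f_N(\theta_N)^T\right)^T .$$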

Condition A1a) is satisfied provided that the nodes coordinate
their weights. Coordination schemes are discussed in [7], [6]. Due
to A1b), note that $\rho_n < 1$ as soon as $n$ is large enough.
Loosely speaking, Assumption A1b) ensures that
$\mathbb{E}(W_n W_n^T)$ is close enough to the projector
$\mathbf{1}\mathbf{1}^T/N$ on the line
$\{t\mathbf{1} : t \in \mathbb{R}\}$. This way, the amount of
information exchanged in the network remains sufficient in order to
reach a consensus. Condition A1c) implies that the r.v. $W_{n+1}$
and $Y_{n+1}$ are independent conditionally on the past. In
addition, $(W_n)_{n \geq 1}$ forms an independent sequence (not
necessarily identically distributed). Condition A1f) means that
each $Y_{n,i}$ can be interpreted as a 'noisy' version of
$-\nabla f_i(\theta_{n-1,i})$. The distribution of the random
additive perturbation $Y_{n,i} + \nabla f_i(\theta_{n-1,i})$ is
likely to depend on the past through the value of
$\boldsymbol{\theta}_{n-1}$, but has a zero mean for any given value
of $\boldsymbol{\theta}_{n-1}$.

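Assumption 2. a) The deterministic sequence
$(\gamma_n)_{n \geq 1}$ is positive and such that
$\sum_n \gamma_n = \infty$.
b) There exists $\alpha > 1/2$ such that:

$$\lim_{n \to \infty} n^{\alpha} \gamma_n = 0 \qquad (4)$$

$$\liminf_{n \to \infty} \frac{1 - \rho_n}{n^{\alpha} \gamma_n} > 0 . \qquad (5)$$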

Note that, when (4) holds true, then $\sum_n \gamma_n^2 < \infty$,
which is a rather common assumption in the framework of decreasing
step size stochastic algorithms [11]. In order to gain some insight
into these conditions, consider the case where
$1 - \rho_n = a/n^{\eta}$ and $\gamma_n = \gamma_0/n^{\xi}$ for some
constants $a, \gamma_0 > 0$. Then, a sufficient condition for A1b)
and A2 is:

$$0 \leq \eta < \xi - 1/2 \leq 1/2 .$$

In particular, $\xi \in (1/2, 1]$ and $\eta \in [0, 1/2)$. The case
$\eta = 0$ typically corresponds to the case where matrices $W_n$
are identically distributed. In this case, $\rho_n = \rho$ is a
constant w.r.t. $n$ and our assumptions reduce to: $\rho < 1$.
However, matrices $W_n$ are not necessarily supposed to be
identically distributed. Our results hold in a more general
setting. As a matter of fact, all results of this paper hold true
when matrices $W_n$ are allowed to converge to the identity matrix
(but at a moderate speed, slower than $1/\sqrt{n}$ in any case).
Therefore, matrix $W_n$ may be taken to be the identity matrix with
high probability, without any restriction on the results presented
in this paper. From a communication network point of view, this
means that the exchange of information between agents reduces to
zero as $n \to \infty$. This remark has practical consequences in
the case of wireless networks, where it is often required to limit
the communication overhead as much as possible.
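As a worked check of the polynomial example, the requirements can be verified exponent by exponent. The sketch below assumes, as in the text, $1 - \rho_n = a/n^{\eta}$ and $\gamma_n = \gamma_0/n^{\xi}$, and reads the conditions as $n(1-\rho_n) \to \infty$, $n^{\alpha}\gamma_n \to 0$, $\liminf (1-\rho_n)/(n^{\alpha}\gamma_n) > 0$ for some $\alpha > 1/2$, and $\sum_n \gamma_n = \infty$:

```latex
\begin{align*}
n(1-\rho_n) &= a\,n^{1-\eta} \to +\infty
  && \Longleftrightarrow\ \eta < 1,\\
n^{\alpha}\gamma_n &= \gamma_0\,n^{\alpha-\xi} \to 0
  && \Longleftrightarrow\ \alpha < \xi,\\
\frac{1-\rho_n}{n^{\alpha}\gamma_n} &= \frac{a}{\gamma_0}\,n^{\xi-\alpha-\eta}
  \ \text{has a positive } \liminf
  && \Longleftrightarrow\ \alpha \le \xi-\eta,\\
\sum_n \gamma_n &= \gamma_0 \sum_n n^{-\xi} = \infty
  && \Longleftrightarrow\ \xi \le 1 .
\end{align*}
```

Since $1 - \rho_n \leq 1$ forces $\eta \geq 0$, some $\alpha > 1/2$ satisfying the second and third lines exists exactly when $\xi - \eta > 1/2$. Together with $\xi \leq 1$ this gives $0 \leq \eta < \xi - 1/2 \leq 1/2$. For instance, $\xi = 0.9$ and $\eta = 0.2$ work with any $\alpha \in (1/2, 0.7]$.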

3. UNCONSTRAINED OPTIMIZATION 
3.1 Framework and Assumptions 

In this section, $G$ is taken to be the whole space, so that the
algorithm (3) simplifies to:

$$\boldsymbol{\theta}_n = (W_n \otimes I_d)\left(\boldsymbol{\theta}_{n-1} + \gamma_n Y_n\right) . \qquad (6)$$

Our aim is to study the convergence of the above iterate
sequence. Note that the sequence $\boldsymbol{\theta}_n$ is not a
priori supposed to stay in a compact set. Additionally, in most
situations, large values of some components of
$\boldsymbol{\theta}_{n-1}$ may lead to large values of $Y_n$.
Otherwise stated, one of the main issues in the unconstrained case
is to demonstrate the stability of the algorithm (6) based on
explicit and intelligible assumptions on the objective function $f$
and on the stochastic perturbation.



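Assumption 3. There exists a function
$V : \mathbb{R}^d \to \mathbb{R}_+$ such that:

a) $V$ is differentiable and $\nabla V$ is a Lipschitz function.
b) For any $\theta \in \mathbb{R}^d$,
$-\nabla V(\theta)^T \nabla f(\theta) \leq 0$.

c) There exists a constant $C_1$ such that for any
$\theta \in \mathbb{R}^d$,

$$|\nabla V(\theta)|^2 \leq C_1 (1 + V(\theta)) .$$

d) For any $M > 0$, the level sets
$\{\theta \in \mathbb{R}^d : V(\theta) \leq M\}$ are compact.

e) The set
$\mathcal{L} := \{\theta \in \mathbb{R}^d : \nabla V(\theta)^T \nabla f(\theta) = 0\}$
is bounded.

f) $V(\mathcal{L})$ has an empty interior.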



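Symbol                                  Meaning
$\theta$                                dummy variable in $\mathbb{R}^d$
$\boldsymbol{\theta}$                   dummy variable in $\mathbb{R}^{dN}$
$\theta_{n,i}$                          estimate at agent $i$ and at time $n$ in $\mathbb{R}^d$
$\boldsymbol{\theta}_n$                 vector of the $N$ agents' estimates in $\mathbb{R}^{dN}$
$\langle\boldsymbol{\theta}_n\rangle$   average of the agents' estimates in $\mathbb{R}^d$
$J$                                     projector onto the consensus subspace
$J^\perp \boldsymbol{\theta}_n$         disagreement vector between agents in $\mathbb{R}^{dN}$
$f$                                     aggregate utility function $f = \sum_i f_i$
$Y_n$                                   vector of all observations at time $n$, in $\mathbb{R}^{dN}$
$\mathbf{1}$                            vector $(1, \dots, 1)^T$ in $\mathbb{R}^N$
$\gamma_n$                              step size
$\rho_n$                                spectral radius of $\mathbb{E}(W_n W_n^T) - \mathbf{1}\mathbf{1}^T/N$

Table 1: Summary of useful notations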



By Assumption A3c), $V$ increases at most at a quadratic rate
$O(|\theta|^2)$ when $|\theta| \to \infty$. Assumption A3f) is
trivially satisfied when $\mathcal{L}$ is finite.

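We denote by $J := (\mathbf{1}\mathbf{1}^T/N) \otimes I_d$ the
projector onto the consensus subspace
$\{\mathbf{1} \otimes \theta : \theta \in \mathbb{R}^d\}$ and by
$J^\perp := I_{dN} - J$ the projector onto the orthogonal subspace.
For any vector $\boldsymbol{\theta} \in \mathbb{R}^{dN}$, remark that
$\boldsymbol{\theta} = \mathbf{1} \otimes \langle\boldsymbol{\theta}\rangle + J^\perp \boldsymbol{\theta}$,
where

$$\langle\boldsymbol{\theta}\rangle := \frac{1}{N} (\mathbf{1}^T \otimes I_d)\, \boldsymbol{\theta} \qquad (7)$$

is a vector of $\mathbb{R}^d$ equal to
$(\theta_1 + \cdots + \theta_N)/N$ in case we write
$\boldsymbol{\theta} = (\theta_1^T, \dots, \theta_N^T)^T$ for some
$\theta_1, \dots, \theta_N$ in $\mathbb{R}^d$.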
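The decomposition of a stacked vector into its consensus part and its disagreement part can be verified on a small example. This is a minimal sketch; the helper name `consensus_projectors` and the numerical values are illustrative assumptions.

```python
import numpy as np

def consensus_projectors(N, d):
    """Return J = (1 1^T / N) kron I_d and its complement I_{dN} - J."""
    J = np.kron(np.ones((N, N)) / N, np.eye(d))
    return J, np.eye(N * d) - J

N, d = 3, 2
J, J_perp = consensus_projectors(N, d)
theta = np.arange(N * d, dtype=float)       # stacked (theta_1^T, ..., theta_N^T)^T
avg = theta.reshape(N, d).mean(axis=0)      # the average <theta> in R^d
recon = np.kron(np.ones(N), avg) + J_perp @ theta   # 1 kron <theta> + J_perp theta
```

Here `recon` reproduces `theta`, and `J`, `J_perp` behave as complementary orthogonal projectors (`J @ J` equals `J`, and `J @ J_perp` vanishes).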



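Assumption 4. There exists a constant $C_2$ such that for any
$\boldsymbol{\theta} = (\theta_1^T, \dots, \theta_N^T)^T$ in
$\mathbb{R}^{dN}$,

$$\mathbb{E}_{\boldsymbol{\theta}}\left[|Y|^2\right] \leq C_2 \left(1 + V(\langle\boldsymbol{\theta}\rangle) + |J^\perp \boldsymbol{\theta}|^2\right) \qquad (8)$$

$$\left|\nabla f(\langle\boldsymbol{\theta}\rangle) - \frac{1}{N}\sum_{i=1}^{N} \nabla f_i(\theta_i)\right| \leq C_2\, |J^\perp \boldsymbol{\theta}| \qquad (9)$$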



Condition (8) implies that $|\nabla f(\theta)|^2 \leq C_2(1 + V(\theta))$.
This means that the mean field $\nabla f(\theta)$ cannot increase
more rapidly than $O(|\theta|)$ as $|\theta| \to \infty$. Condition
(9) is in particular satisfied in case all $\nabla f_i$'s are
Lipschitz functions. Condition (9) ensures that small variations of
the vector $\boldsymbol{\theta}$ near the consensus space cannot
produce large variations of $\nabla f_i(\theta_i)$.



Assumption A3b) means that $V$ is a Lyapunov function for
$-\nabla f$. In the case of gradient systems obtained from
optimization problems such as (1), a Lyapunov function $V$ is
usually given by the objective function $f$ itself, or by a
composition $\phi \circ f$ of $f$ with a well-chosen increasing map
$\phi$: Assumption A3b) is then trivially satisfied. In this case,
the set $\mathcal{L}$ reduces to the roots of $\nabla f$:

$$\mathcal{L} = \{\theta \in \mathbb{R}^d : \nabla f(\theta) = 0\} .$$

Assumption A3 combined with the condition
$\sum_n \gamma_n = +\infty$ makes it possible to prove the
convergence of the deterministic sequence
$t_{n+1} = t_n - \gamma_{n+1} \nabla f(t_n)$ to the set
$\mathcal{L}$. When $\nabla f$ is unknown and replaced by a
stochastic approximation, the limiting behavior of the noisy
algorithm is similar under some regularity conditions and under the
assumption that the step-size sequence satisfies
$\sum_n \gamma_n^2 < \infty$.



3.2 Convergence w.p.1

The disagreement between agents can be quantified through
the norm of the vector

$$J^\perp \boldsymbol{\theta}_n = \boldsymbol{\theta}_n - \mathbf{1} \otimes \langle\boldsymbol{\theta}_n\rangle .$$

Lemma 1 (Agreement). Under A1, A2, A3a-c) and A4,

i) $J^\perp \boldsymbol{\theta}_n$ converges to zero almost surely
(a.s.) as $n \to \infty$.

ii) For any $\beta < 2\alpha$,
$\lim_{n \to \infty} n^{\beta}\, \mathbb{E}\left[|J^\perp \boldsymbol{\theta}_n|^2\right] = 0$.

Lemma 1 is the key result to characterize the asymptotic
behavior of the algorithm. The proof is omitted due to lack
of space, but will be presented in an extended version of this
paper. Point i) means that the disagreement between agents
converges almost surely to zero. Point ii) states that the
convergence also holds in $L^2$ and that the convergence speed
is faster than $1/\sqrt{n}$. This point will prove especially
useful in Section 3.3. Define
$d(\theta, A) := \inf\{|\theta - \varphi| : \varphi \in A\}$ for any
$\theta \in \mathbb{R}^d$ and $A \subset \mathbb{R}^d$. Define
$\mathbf{1} \otimes \mathcal{L} := \{\mathbf{1} \otimes \theta : \theta \in \mathcal{L}\}$.



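Theorem 1. Assume A1, A2, A3 and A4. Then, w.p.1,

$$\lim_{n \to \infty} d(\boldsymbol{\theta}_n, \mathbf{1} \otimes \mathcal{L}) = 0 .$$

Moreover, w.p.1, $(\langle\boldsymbol{\theta}_n\rangle)_{n \geq 1}$
converges to a connected component of $\mathcal{L}$.

Theorem 1 states that, almost surely, the vector of iterates
$\boldsymbol{\theta}_n$ converges to the consensus space as
$n \to \infty$. Moreover, the average iterate
$\langle\boldsymbol{\theta}_n\rangle$ of the network converges to
some connected component of $\mathcal{L}$. When $\mathcal{L}$ is
finite, Theorem 1 implies that $\boldsymbol{\theta}_n$ converges
a.s. to some point in $\mathbf{1} \otimes \mathcal{L}$.

The proof of Theorem 1 is omitted. Conditions A2, A3a-e) and A4
imply that, almost surely, (a) the sequence
$(\langle\boldsymbol{\theta}_n\rangle)_{n \geq 1}$ remains in a
neighborhood of $\mathcal{L}$, thus implying that the sequence
remains in a compact set of $\mathbb{R}^d$, and (b) the sequence
$(V(\langle\boldsymbol{\theta}_n\rangle))_{n \geq 1}$ converges to a
connected component of $V(\mathcal{L})$. Finally, A3f) implies the
convergence of $(\langle\boldsymbol{\theta}_n\rangle)_{n \geq 1}$ to
a connected component of $\mathcal{L}$.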
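The behavior described by Theorem 1 can be observed on a toy problem. The sketch below is an illustration under stated assumptions, not the setting of the paper's experiments: quadratic local utilities $f_i(\theta) = \frac{1}{2}|\theta - c_i|^2$ (so the minimizer of the aggregate is the mean of the $c_i$), Gaussian gradient noise, pairwise gossip with uniformly chosen pairs, and a step size $\gamma_n = \gamma_0/n^{\xi}$. All names and numerical values are hypothetical.

```python
import numpy as np

def simulate(c, n_iter=5000, gamma0=1.0, xi=0.8, noise_std=0.5, seed=0):
    """Unconstrained distributed R-M iteration with pairwise gossip."""
    rng = np.random.default_rng(seed)
    N, d = c.shape
    theta = rng.standard_normal((N, d))           # row i: estimate of agent i
    for n in range(1, n_iter + 1):
        gamma = gamma0 / n**xi
        # Noisy observation of -grad f_i at each agent's current estimate.
        Y = -(theta - c) + noise_std * rng.standard_normal((N, d))
        tilde = theta + gamma * Y                 # local step (no projection)
        i, j = rng.choice(N, size=2, replace=False)
        tilde[[i, j]] = 0.5 * (tilde[i] + tilde[j])   # the active pair averages
        theta = tilde
    return theta

c = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 2.0]])
theta = simulate(c)
avg = theta.mean(axis=0)                          # network average <theta_n>
disagreement = np.linalg.norm(theta - avg)        # norm of J_perp theta_n
```

In this run the disagreement is small and the network average lies close to the mean of the $c_i$, the unique critical point of the aggregate objective.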

3.3 Central Limit Theorem 

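Let $\theta_*$ be a point satisfying the following Assumption.

Assumption 5. a) $\theta_* \in \mathcal{L}$.
b) Function $f$ is two times differentiable at point $\theta_*$ and
$\nabla f(\theta) = H(\theta_*)(\theta - \theta_*) + O(|\theta - \theta_*|^2)$
for any $\theta$ in a neighborhood of $\theta_*$, where
$H(\theta_*)$ denotes the $d \times d$ Hessian matrix of $f$ at
point $\theta_*$.

c) $H(\theta_*)$ is a stable matrix: the largest real part of its
eigenvalues is $-L$, where $L > 0$.

d) There exists $\delta > 0$ such that the function
$\boldsymbol{\theta} \mapsto \mathbb{E}_{\boldsymbol{\theta}}\left[|Y|^{2+\delta}\right]$
is bounded in a neighborhood of $\mathbf{1} \otimes \theta_*$.

e) The matrix-valued function
$Q : \mathbb{R}^{dN} \to \mathbb{R}^{d \times d}$ defined by:

$$Q(\boldsymbol{\theta}) = \mathbb{E}_{\boldsymbol{\theta}}\left[\left(\langle Y\rangle - \mathbb{E}_{\boldsymbol{\theta}}\langle Y\rangle\right)\left(\langle Y\rangle - \mathbb{E}_{\boldsymbol{\theta}}\langle Y\rangle\right)^T\right]$$

is continuous at point $\mathbf{1} \otimes \theta_*$.

f) Matrix $Q(\mathbf{1} \otimes \theta_*)$ is positive definite.

Assumption 6. For any $n \geq 1$, $\gamma_n = \gamma_0 n^{-\xi}$
where $\xi \in (1/2, 1]$ and $\gamma_0 > 0$. In case $\xi = 1$, we
furthermore assume that $2L\gamma_0 > 1$.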



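The normalized disagreement vector
$\gamma_n^{-1/2} J^\perp \boldsymbol{\theta}_n$ converges to zero in
probability by Lemma 1. Therefore, it can be shown that the
asymptotic analysis reduces to the study of the average
$\langle\boldsymbol{\theta}_n\rangle$. To that end, we remark from
A1a) that
$(\mathbf{1}^T \otimes I_d)(W_n \otimes I_d) = (\mathbf{1}^T \otimes I_d)$.
Thus, $\langle\boldsymbol{\theta}_n\rangle$ satisfies:
$\langle\boldsymbol{\theta}_n\rangle = \langle\boldsymbol{\theta}_{n-1}\rangle + \gamma_n \langle Y_n\rangle$.
The main step is to rewrite the above equality under the form:

$$\langle\boldsymbol{\theta}_n\rangle = \langle\boldsymbol{\theta}_{n-1}\rangle + \gamma_n \left(-\nabla f(\langle\boldsymbol{\theta}_{n-1}\rangle) + \delta M_n + r_n\right) ,$$

where $\delta M_n$ is a martingale increment sequence satisfying
some desired properties (details are omitted) and where $r_n$
is a random sequence which is proved to be negligible. The
final result is a consequence of [12]. A sequence of r.v.
$(X_n)_n$ is said to converge in distribution (stably) to a r.v.
$X$ given an event $E$ whenever
$\lim_n \mathbb{E}(g(X_n) 1_E) = \mathbb{E}(g(X))\, \mathbb{P}(E)$
for any bounded continuous function $g$.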

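Theorem 2. Assume A1-A4, A6 and assume that there
exists a point $\theta_*$ satisfying A5. Then, given the event

$$\left\{ \lim_{n \to \infty} \langle\boldsymbol{\theta}_n\rangle = \theta_* \right\} ,$$

the following holds true:

$$\gamma_n^{-1/2} \left(\boldsymbol{\theta}_n - \mathbf{1} \otimes \theta_*\right) \xrightarrow{\mathcal{D}} \mathbf{1} \otimes Z ,$$

where $Z$ is a $d \times 1$ zero mean Gaussian vector whose
covariance matrix $\Sigma$ is the unique solution to:

$$\left(H(\theta_*) + \zeta I_d\right) \Sigma + \Sigma \left(H(\theta_*) + \zeta I_d\right) = -Q(\mathbf{1} \otimes \theta_*) \qquad (10)$$

where $\zeta = 0$ if $\xi \in (1/2, 1)$ and $\zeta = 1/(2\gamma_0)$
if $\xi = 1$.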
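Equation (10) is a continuous-time Lyapunov equation, so the limiting covariance can be computed numerically once $H(\theta_*)$, $Q$ and $\zeta$ are known. The sketch below assumes the equation reads $(H + \zeta I)\Sigma + \Sigma(H + \zeta I)^T = -Q$ and solves it by vectorization with Kronecker products. The function name `clt_covariance` and the matrices used are hypothetical.

```python
import numpy as np

def clt_covariance(H, Q, zeta=0.0):
    """Solve A S + S A^T = -Q for S, with A = H + zeta * I (A must be stable)."""
    d = H.shape[0]
    A = H + zeta * np.eye(d)
    # Row-major vectorization: vec(A S) = (A kron I) vec(S), vec(S A^T) = (I kron A) vec(S).
    K = np.kron(A, np.eye(d)) + np.kron(np.eye(d), A)
    return np.linalg.solve(K, -Q.reshape(-1)).reshape(d, d)

H = np.array([[-2.0, 0.5], [0.5, -1.0]])   # illustrative stable symmetric matrix
Q = np.eye(2)                               # illustrative noise covariance
Sigma = clt_covariance(H, Q)

# Scalar case: H = -L gives Sigma = Q / (2 (L - zeta)), here 1 / (2 * 1.5).
sigma_scalar = clt_covariance(np.array([[-2.0]]), np.array([[1.0]]), zeta=0.5)
```

The scalar case makes the role of the condition $2L\gamma_0 > 1$ visible: with $\xi = 1$ and $\zeta = 1/(2\gamma_0)$, the solution $Q/(2(L - \zeta))$ is positive only if $L > \zeta$.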

Theorem 2 states that, given the event that the sequence
$\boldsymbol{\theta}_n$ converges to a given point
$\mathbf{1} \otimes \theta_*$, the normalized error
$\gamma_n^{-1/2}(\boldsymbol{\theta}_n - \mathbf{1} \otimes \theta_*)$
converges to a Gaussian vector. The latter limiting random vector
belongs to the consensus subspace, i.e., it has the form
$\mathbf{1} \otimes Z$, where $Z$ is a Gaussian r.v. of dimension
$d$. Theorem 2 has the following important consequences. First,
thanks to the gossip step, the component of the error vector in the
subspace orthogonal to the consensus subspace is asymptotically
negligible. The dominant source of error is due to the presence of
observation noise in the algorithm, and not to possible
disagreements between agents. As a matter of fact, the limiting
behavior of the average estimate is similar to the one that would
have been observed in a centralized setting. Interestingly, this
remark is true even if the agents reduce their cooperation as time
increases (consider the case where $W_n = I_N$ with probability
converging to one).

3.4 Influence of the network topology 

To illustrate our claims, assume for simplicity that
$(W_n)_{n \geq 1}$ is an i.i.d. sequence. Then $\rho_n =: \rho$ is a
constant w.r.t. $n$. In this case, all our hypotheses on the
sequence $(W_n)_{n \geq 1}$ reduce to:

$$\rho < 1 . \qquad (11)$$

In order to gain more insight, it is useful to relate the above
inequality to a connectivity condition on the network. To that end,
we focus on an example. Assume for instance that matrices $W_n$
follow the now widespread asynchronous random pairwise gossip model
described in [6]. At a given time instant $n$, a node $i$, picked at
random, wakes up and exchanges information with another node $j$,
also chosen at random (other nodes $k \notin \{i, j\}$ do not
participate in any exchange of information). $W_n$ belongs to the
alphabet $\{W_{i,j} : i, j = 1, \dots, N\}$ where:

$$W_{i,j} := I_N - (e_i - e_j)(e_i - e_j)^T/2 ,$$

where $e_i$ represents the $i$th vector of the canonical basis
($e_i(k) = 1$ if $i = k$, zero otherwise). Denote by
$P_{i,j} = P_{j,i}$ the probability that the active pair of nodes at
instant $n$ coincides with the pair $\{i, j\}$. In practice,
$P_{i,j}$ is nonzero only if nodes $i, j$ are able to communicate
(i.e. they are connected). Consider the weighted nondirected graph
$\mathcal{S} = (\mathcal{E}, \mathcal{V}, \mathcal{W})$ where
$\mathcal{E}$ is the set of vertices $\{1, \dots, N\}$,
$\mathcal{V}$ is the set of edges (by definition, $i$ is connected
to $j$ iff $P_{i,j} > 0$), and $\mathcal{W}$ associates the weight
$P_{i,j}$ to the connected pair $\{i, j\}$. Using [6], it is
straightforward to show that condition (11) is equivalent to the
condition that $\mathcal{S}$ is connected.
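The link between condition (11) and connectivity can be checked numerically for the pairwise gossip model. The sketch below computes the spectral radius of $\mathbb{E}(W_n W_n^T) - \mathbf{1}\mathbf{1}^T/N$ from the pair probabilities. The function name and the two small graphs are illustrative assumptions; only the upper triangle of the probability matrix is read.

```python
import numpy as np

def gossip_spectral_radius(P):
    """P[i, j] for i < j: probability that the pair {i, j} is active (sums to 1)."""
    N = P.shape[0]
    I = np.eye(N)
    M = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            v = I[i] - I[j]
            W = I - np.outer(v, v) / 2.0      # pairwise averaging matrix W_{i,j}
            M += P[i, j] * (W @ W.T)          # accumulates E[W W^T]
    M -= np.ones((N, N)) / N
    return np.max(np.abs(np.linalg.eigvalsh(M)))   # M is symmetric

# Connected path graph 0 - 1 - 2, each edge active with probability 1/2.
P_path = np.zeros((3, 3))
P_path[0, 1] = P_path[1, 2] = 0.5
# Disconnected graph: two separate edges {0, 1} and {2, 3}.
P_split = np.zeros((4, 4))
P_split[0, 1] = P_split[2, 3] = 0.5

rho_connected = gossip_spectral_radius(P_path)      # 0.75
rho_disconnected = gossip_spectral_radius(P_split)  # 1.0
```

For the path graph the spectral radius equals 0.75, strictly below one, whereas for the disconnected graph it equals one, so that (11) fails.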

Corollary 1. Replace the conditions on the spectral radius
$\rho_n$ with the assumption that $\mathcal{S}$ is connected. Then
Theorems 1 and 2 still hold true.

In particular, the (nonzero) spectral gap of the Laplacian
of $\mathcal{S}$ has no impact on the asymptotic behavior of the
sequence $\boldsymbol{\theta}_n$. Stated differently, the dominant
source of error in the asymptotic regime is due to the observation
noise. The disagreement between agents is negligible even in
networks with a low level of connectivity.

4. CONSTRAINED OPTIMIZATION 
4.1 Framework and Assumptions 

We now study the case where the set $G$ is determined by a
set of $p$ inequality constraints ($p \geq 1$):

$$G := \left\{\theta \in \mathbb{R}^d : \forall j = 1, \dots, p,\ q_j(\theta) \leq 0\right\} \qquad (12)$$

for some functions $q_1, \dots, q_p$ which satisfy the following
conditions. Denote by $\partial G$ the boundary of $G$. For any
$\theta \in G$, we denote by
$\mathcal{A}(\theta) \subset \{1, \dots, p\}$ the set of active
constraints, i.e., $q_j(\theta) = 0$ if $j \in \mathcal{A}(\theta)$
and $q_j(\theta) < 0$ otherwise.

Assumption 7. a) The set $G$ defined by (12) is compact.
b) For any $j = 1, \dots, p$,
$q_j : \mathbb{R}^d \to \mathbb{R}$ is a convex function.
c) For any $j = 1, \dots, p$, $q_j$ is two times continuously
differentiable in a neighborhood of $\partial G$.

d) For any $\theta \in \partial G$,
$\{\nabla q_j(\theta) : j \in \mathcal{A}(\theta)\}$ is a linearly
independent collection of vectors.



In the particular case where all utility functions
$f_1, \dots, f_N$ are assumed convex, it is possible to study the
convergence w.p.1 of the algorithm (3) following an approach
similar to [7], and to prove under some conditions that consensus
is achieved at a global minimum of the aggregate objective
function $f$. Nevertheless, utility functions may not be convex in
a large number of situations, and there seems to be little hope of
generalizing the proof of [7] to such a wider setting. In this
paper, we do not assume that the utility functions are convex. In
this situation, convergence to a global minimum of (1) is no longer
guaranteed. We nevertheless prove the convergence of the algorithm
(3) to the set of Kuhn-Tucker (KT) points $\mathcal{L}_{KT}$:

$$\mathcal{L}_{KT} := \left\{\theta \in G : -\nabla f(\theta) \in \mathcal{N}_G(\theta)\right\} ,$$

where $\mathcal{N}_G(\theta)$ is the normal cone to $G$, i.e.,
$\mathcal{N}_G(\theta) := \{v \in \mathbb{R}^d : \forall \theta' \in G,\ v^T(\theta - \theta') \geq 0\}$.
To prove convergence, we need one more hypothesis:

Assumption 8. The following two conditions hold:

a) $\sup_{\boldsymbol{\theta} \in G^N} \mathbb{E}_{\boldsymbol{\theta}}\left[|Y|^2\right] < \infty$.

b) Inequality (9) holds for any $\boldsymbol{\theta} \in G^N$.



4.2 Convergence w.p.1

Theorem 3 below establishes two points: First, a consensus is
achieved as $n$ tends to infinity, meaning that
$J^\perp \boldsymbol{\theta}_n$ converges a.s. to zero. Second, the
average estimate $\langle\boldsymbol{\theta}_n\rangle$ converges to
the set of KT points.

Theorem 3. Assume J^and Then, w.p.l, 

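$$\lim_{n \to \infty} d(\boldsymbol{\theta}_n, \mathbf{1} \otimes \mathcal{L}_{KT}) = 0 .$$

Moreover, w.p.1, $(\langle\boldsymbol{\theta}_n\rangle)_{n \geq 1}$
converges to a connected component of $\mathcal{L}_{KT}$.

As a consequence, if $\mathcal{L}_{KT}$ contains only isolated
points, the sequence $\langle\boldsymbol{\theta}_n\rangle$ converges
almost surely to one of these points. The complete proof of
Theorem 3 is omitted. We however provide some elements of the proof
in the next paragraph.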

4.3 Sketch of the proof 

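To simplify the presentation, we shall focus on the case $p = 1$,
i.e., there is only one inequality constraint. We put $q := q_1$
and define $e := \nabla q/|\nabla q|$, the normalized gradient of
function $q$ ($e$ is well defined in a neighborhood of $\partial G$
by A7).

Step 1: Agreement is achieved as $n \to \infty$.
Similarly to the unconstrained optimization case (recall
Lemma 1), the first step of the proof of Theorem 3 is to establish
that $|J^\perp \boldsymbol{\theta}_n|$ converges a.s. to zero. As a
noticeable difference with the unconstrained case, here stability
issues do not come into play as $G$ is bounded (for this reason,
the proof of agreement is simpler than in the unconstrained case).

Step 2: Expression of the average
$\langle\boldsymbol{\theta}_n\rangle$ in an R-M like form.
Using
$(\mathbf{1}^T \otimes I_d)(W_n \otimes I_d) = (\mathbf{1}^T \otimes I_d)$,
it is convenient to write
$\langle\boldsymbol{\theta}_n\rangle = \langle\boldsymbol{\theta}_{n-1}\rangle + \gamma_n Z_n$
where

$$Z_n := \frac{1}{N\gamma_n} \sum_{i=1}^{N} \left(P_G(\theta_{n-1,i} + \gamma_n Y_{n,i}) - \theta_{n-1,i}\right) .$$

Consider the martingale increment sequence
$\Delta_n := Z_n - \mathbb{E}(Z_n \mid \mathcal{F}_{n-1})$. From
Assumption A8a), it can be shown that
$\sup_n \mathbb{E}[|\Delta_n|^2] < \infty$. Now note that for any
$\theta \in G$, $y \in \mathbb{R}^d$,

$$\lim_{\gamma \downarrow 0} \gamma^{-1}\left(P_G(\theta + \gamma y) - \theta\right) = y - \left(y^T e(\theta)\right)^+ e(\theta)\, 1_{\partial G}(\theta) , \qquad (13)$$

where $(x)^+ := \max(x, 0)$ and where $1_{\partial G}$ is the
indicator function of $\partial G$. Using (13) along with A7c) and
A8b) and the fact that $|J^\perp \boldsymbol{\theta}_n|$ converges
to zero, we obtain after some algebra:

$$\langle\boldsymbol{\theta}_n\rangle = \langle\boldsymbol{\theta}_{n-1}\rangle + \gamma_n h(\boldsymbol{\theta}_{n-1}) + \gamma_n \Delta_n + \gamma_n u_n \qquad (14)$$

where $u_n$ is some sequence which converges to zero a.s. and
where we defined for any $\boldsymbol{\theta} \in G^N$,

$$h(\boldsymbol{\theta}) := -\nabla f(\langle\boldsymbol{\theta}\rangle) - \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_{\boldsymbol{\theta}}\left[\left(Y_i^T e(\theta_i)\right)^+\right] e(\theta_i)\, 1_{\partial G}(\theta_i) .$$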
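The directional limit (13) can be checked by finite differences on a simple constraint set. The sketch below takes $G$ to be the closed unit ball, i.e. the hypothetical choice $q(\theta) = |\theta|^2 - 1$, for which $e(\theta) = \theta/|\theta|$ on the boundary. Function names and test points are illustrative.

```python
import numpy as np

def project_ball(x):
    """Euclidean projection onto the closed unit ball."""
    nrm = np.linalg.norm(x)
    return x if nrm <= 1.0 else x / nrm

def directional_limit(theta, y, on_boundary):
    """Right-hand side of (13): y - (y^T e)^+ e on the boundary, y in the interior."""
    e = theta / np.linalg.norm(theta) if on_boundary else np.zeros_like(theta)
    return y - max(y @ e, 0.0) * e

gamma = 1e-6
theta_b = np.array([1.0, 0.0])            # boundary point
y_out = np.array([2.0, 1.0])              # pushes outward: normal part is removed
y_in = np.array([-2.0, 1.0])              # pushes inward: nothing is removed
theta_i = np.array([0.5, 0.0])            # interior point

fd_out = (project_ball(theta_b + gamma * y_out) - theta_b) / gamma
fd_in = (project_ball(theta_b + gamma * y_in) - theta_b) / gamma
fd_int = (project_ball(theta_i + gamma * y_out) - theta_i) / gamma
```

For the outward direction the finite difference is close to $(0, 1)$: the component of $y$ along the outward normal is cancelled by the projection, which is the origin of the extra term in $h$.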

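Step 3: From equality to inclusion.

Equality (14) is still far from a conventional R-M equation.
Indeed, the second term of the right-hand side,
$\gamma_n h(\boldsymbol{\theta}_{n-1})$, is not a function of the
average $\langle\boldsymbol{\theta}_{n-1}\rangle$ as it depends on
the whole vector $\boldsymbol{\theta}_{n-1}$. Of course, since the
agreement is achieved for large $n$, $\boldsymbol{\theta}_{n-1}$
should be close to
$\mathbf{1} \otimes \langle\boldsymbol{\theta}_{n-1}\rangle$. If $h$
were continuous, one could thus write
$h(\boldsymbol{\theta}_{n-1}) \simeq h(\mathbf{1} \otimes \langle\boldsymbol{\theta}_{n-1}\rangle)$,
solving this way the latter issue. This is unfortunately not
the case, due to the presence of indicator functions in the
definition of $h$. We must resort to inclusions. For any
$\epsilon \geq 0$ and any $\theta \in G$, define the following
subset of $\mathbb{R}^d$:

$$F_\epsilon(\theta) := \left\{-\nabla f(\theta) - x\, e(\theta)\, 1_{d(\theta, \partial G) \leq \epsilon} : x \in [0, M]\right\}$$

where $M < \infty$ is a fixed constant chosen as large as needed,
and where $1_{d(\theta, \partial G) \leq \epsilon}$ is equal to one
if $\theta$ is at distance at most $\epsilon$ from the boundary, and
to zero otherwise. In particular,
$1_{d(\theta, \partial G) \leq \epsilon} = 1_{\partial G}(\theta)$
for $\epsilon = 0$. It is straightforward to show that:

$$\forall \boldsymbol{\theta} \in G^N, \quad h(\boldsymbol{\theta}) \in F_{|J^\perp \boldsymbol{\theta}|}(\langle\boldsymbol{\theta}\rangle)$$

provided that $M$ is chosen large enough. Finally, equality (14)
can be interpreted in terms of the following inclusion:

$$\langle\boldsymbol{\theta}_n\rangle \in \langle\boldsymbol{\theta}_{n-1}\rangle + \gamma_n F_{\epsilon_n}(\langle\boldsymbol{\theta}_{n-1}\rangle) + \gamma_n \Delta_n + \gamma_n u_n \qquad (15)$$

where we defined for simplicity
$\epsilon_n := |J^\perp \boldsymbol{\theta}_{n-1}|$.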

Step 4: Interpolated process and differential inclusions.
From this point to the end of the proof, we study one fixed
trajectory $(\langle\boldsymbol{\theta}_n(\omega)\rangle)_n$ of the
random process $\langle\boldsymbol{\theta}_n\rangle$, where $\omega$
belongs to an event of probability one such that
$\epsilon_n(\omega) \to 0$, $u_n(\omega) \to 0$ as $n$ tends to
infinity, and the sequence $(\Delta_n(\omega))_n$ satisfies some
asymptotic rate of change condition (see [11], [10] for details).
Dependencies in $\omega$ are however omitted for simplicity.
Motivated by the approach of [10], we consider the following
continuous-time interpolated process. Define
$\tau_n = \sum_{k=1}^{n} \gamma_k$ and

$$\Theta(t) := \langle\boldsymbol{\theta}_{n-1}\rangle + \frac{t - \tau_{n-1}}{\tau_n - \tau_{n-1}} \left(\langle\boldsymbol{\theta}_n\rangle - \langle\boldsymbol{\theta}_{n-1}\rangle\right) , \qquad \tau_{n-1} \leq t < \tau_n .$$

The next step is to prove that $\Theta$ is a perturbed solution to
the differential inclusion:

$$\frac{dx(t)}{dt} \in F_0(x(t)) . \qquad (16)$$

When we write that $x$ is a solution to (16), we mean that
$x$ is an absolutely continuous mapping
$x : \mathbb{R} \to \mathbb{R}^d$ such that (16) is satisfied for
almost all $t \in \mathbb{R}$. A function $\Theta$ is a perturbed
solution to (16) if it 'shadows' the behavior of a solution to (16)
as $t \to \infty$ in a sense made clear in [10]. In order to prove
that $\Theta$ is a perturbed solution to (16), the arguments are
close to those of [10] (see Proposition 1.3), with some care,
however, about the fact that the mean field $F_{\epsilon_n}$ is
nonhomogeneous in our context (it depends on time $n$). The proof is
concluded by a straightforward application of [10]. Consider the
differential inclusion (16): function $f$ is a Lyapunov function
for the set of KT points $\mathcal{L}_{KT}$. Therefore, by [10], the
limit set

$$\bigcap_{t \geq 0} \overline{\Theta([t, +\infty))}$$

is included in $\mathcal{L}_{KT}$. This concludes the proof.



5. APPLICATION: POWER ALLOCATION 
5.1 Framework 

The context of power allocation for wireless networks has
recently attracted a great deal of attention in the fields of
distributed optimization and cooperative and noncooperative game
theory (see [13] and references therein). We consider an
ad hoc network composed of $N$ transmit-destination pairs.
Each agent/user sends digital data to its receiver through $K$
parallel (sub)channels. The channel gain of the $i$th user at
the $k$th subchannel is represented by a positive coefficient
$A^{i,i;k}$ which can be interpreted as the square modulus of the
corresponding complex-valued channel gain. As all agents
share the same spectral band, user $i$ suffers from the multiuser
interference produced by other users $j \neq i$. Denote by
$p^{i;k} \geq 0$ the power allocated by user $i$ to the $k$th
subchannel. We assume that
$\sum_{k=1}^{K} p^{i;k} \leq \mathcal{P}_i$ where $\mathcal{P}_i$ is
the maximum allowed power for user $i$. Define
$p^i = [p^{i;1}, \dots, p^{i;K}]^T$ and
$\theta = [{p^1}^T, \dots, {p^N}^T]^T$ the vector of all powers of
all users, of size $d := KN$. Assuming deterministic channels, user
$i$ is able to provide its destination with the rate given by (see
e.g. [13])

$$R_i(\theta, A^i) := \sum_{k=1}^{K} \log\left(1 + \frac{A^{i,i;k}\, p^{i;k}}{\sigma_i^2 + \sum_{j \neq i} A^{j,i;k}\, p^{j;k}}\right)$$

where $A^{j,i;k}$ is the (positive) channel gain between
transmitter $j$ and the destination of the $i$th
transmit-destination pair, and where
$A^i = [A^{1,i;1}, \dots, A^{N,i;K}]^T$. Here, $\sigma_i^2$ is the
variance of the additive white Gaussian noise at the destination of
source $i$. The aim is to select a relevant value for the resource
allocation parameter $\theta \in G$ in a distributed fashion, where
$G$ is the set of constraints obtained from the aforementioned power
constraints $\mathcal{P}_1, \dots, \mathcal{P}_N$ and positivity
constraints.
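A minimal sketch of the rate computation follows. It assumes the usual Gaussian interference-channel expression in which interference is treated as noise, i.e. a sum over subchannels of $\log(1 + \text{signal}/(\text{noise} + \text{interference}))$. The array layout (`A[j, i, k]` for the gain from transmitter $j$ to destination $i$ on subchannel $k$) and the function name `rate` are hypothetical.

```python
import numpy as np

def rate(i, p, A, sigma2):
    """Rate of user i.

    p: (N, K) array of powers, p[j, k] is the power of user j on subchannel k.
    A: (N, N, K) array of gains, A[j, i, k] from transmitter j to destination i.
    sigma2: (N,) array of noise variances.
    """
    N, K = p.shape
    signal = A[i, i] * p[i]                                   # shape (K,)
    interference = sum(A[j, i] * p[j] for j in range(N) if j != i)
    return float(np.sum(np.log1p(signal / (sigma2[i] + interference))))

A = np.ones((2, 2, 1))
sigma2 = np.array([1.0, 1.0])
r_interf = rate(0, np.array([[1.0], [1.0]]), A, sigma2)   # log(1 + 1/2)
r_alone = rate(0, np.array([[1.0], [0.0]]), A, sigma2)    # log(1 + 1)
```

Switching off the interfering user raises the rate of user 0 from $\log(1.5)$ to $\log(2)$ in this two-user, single-subchannel example.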



5.2 Deterministic Coalitional Allocation 

To simplify the presentation, we first consider the case of
fixed deterministic channel gains $A^1, \dots, A^N$. A widespread
approach consists in computing $\theta$ through the so-called best
response dynamics. At every step of the iteration, an agent $i$
updates its own power vector assuming other users' power
to be fixed. This is the well-known iterative water-filling
algorithm [13]. Here, we are interested in a different perspective.
The aim is rather to search for social fairness between
users. We aim at finding a local maximum of the following
weighted sum rate:

$$\sum_{i=1}^{N} \beta_i R_i(\theta, A^i) \qquad (17)$$

where $\beta_i$ is an arbitrary positive deterministic weight known
only by agent $i$. Consider the following deterministic gradient
algorithm. Each user $i$ has an estimate $\theta_{n,i}$ of $\theta$
at the $n$th iteration. Here, we stress the fact that a given
user has not only an estimate of what should be its own
power allocation $p^i$, but also an estimate of what should
be the power allocation of other users $j \neq i$. Denote by
$\boldsymbol{\theta}_n = [\theta_{n,1}^T, \dots, \theta_{n,N}^T]^T$
the vector of size $dN = KN^2$ which gathers all local estimates.
Similarly to (3), a distributed algorithm for the maximization of
(17) would have the form
$\boldsymbol{\theta}_n = (W_n \otimes I_d)\, P_{G^N}\left[\boldsymbol{\theta}_{n-1} + \gamma_n Y(\boldsymbol{\theta}_{n-1}; A)\right]$
where
$Y(\boldsymbol{\theta}; A) = [\beta_1 \nabla_\theta R_1(\theta_1; A^1)^T, \dots, \beta_N \nabla_\theta R_N(\theta_N; A^N)^T]^T$
and where $\nabla_\theta$ is the gradient operator with respect to
the first argument $\theta$ of $R_i(\theta, A^i)$.
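The projection onto the power constraint set is the only non-trivial ingredient of this iteration. The sketch below assumes that $G$ is the product over users of the sets $\{p \geq 0,\ \sum_k p_k \leq \mathcal{P}_i\}$ with $\mathcal{P}_i > 0$, and uses the standard sort-based simplex projection when the sum constraint is active. The function names `project_power` and `project_G` are hypothetical.

```python
import numpy as np

def project_power(p, P_max):
    """Euclidean projection onto {p >= 0, sum(p) <= P_max}, with P_max > 0."""
    q = np.maximum(p, 0.0)
    if q.sum() <= P_max:
        return q                      # clipping alone already lands in the set
    # Otherwise the sum constraint is active: project onto the simplex of mass P_max.
    u = np.sort(p)[::-1]
    css = np.cumsum(u) - P_max
    k = np.arange(1, p.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(p - tau, 0.0)

def project_G(theta, P_max):
    """theta: (N, K) powers of all users; P_max: (N,) budgets. Blockwise projection."""
    return np.vstack([project_power(theta[i], P_max[i])
                      for i in range(theta.shape[0])])

inside = project_power(np.array([0.2, 0.3]), 1.0)      # unchanged
clipped = project_power(np.array([-1.0, 0.5]), 1.0)    # negative entry set to zero
shared = project_power(np.array([1.0, 1.0]), 1.0)      # budget split evenly
corner = project_power(np.array([2.0, 0.0]), 1.0)      # all power on one subchannel
```

Because each agent keeps an estimate of the full vector of powers, the same blockwise projection would be applied to every agent's local estimate.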

5.3 Stochastic Coalitional Allocation 

In many situations however, the above algorithm is impractical.
This is for instance the case when the channel gains
are random and rapidly time-varying in an ergodic fashion.
This is also the case when channel gains are known only up
to a random perturbation. In such settings, it is more likely
that each user $i$ observes a random sequence $(A_n^i)_{n \geq 1}$,
where $A_n^1, \dots, A_n^N$ typically correspond to the realization
at time $n$ of a time-varying ergodic channel. The distributed
optimization scheme is given by equation (3) where

$$Y_n = Y(\boldsymbol{\theta}_{n-1}; A_n^1, \dots, A_n^N) .$$

Assume for the sake of simplicity that the sequence
$(A_n^1, \dots, A_n^N)_{n \geq 1}$ is i.i.d. By Theorem 3, all
users converge to a consensus on the global resource allocation
parameters. After convergence of the distributed R-M algorithm, the
resource allocation parameters achieve a Kuhn-Tucker point of the
optimization problem:

$$\max_{\theta \in G} \sum_{i=1}^{N} \beta_i\, \mathbb{E}\left[R_i(\theta, A^i)\right] \qquad (18)$$

where the expectation in the inner sum is taken w.r.t. the
channel coefficients.

We provide some numerical results. Consider four nodes: 1
is connected to 2 (1 ~ 2), 1 ~ 3, 2 ~ 3, 2 ~ 4, 3 ~ 4. Assume
Q = 2, $\beta_1 = \beta_3 = 0.3$, $\beta_2 = \beta_4 = 0.2$, and
noise variances $\sigma_i^2$ equal to 0.1 for two of the nodes and
to 0.05 and 0.02 for the other two. Assume that all r.v.
$A_n^{j,i;k}$ are i.i.d. with standard exponential distribution.
The network model is chosen as in Section 3.4. The algorithm is
initialized at a random point $\boldsymbol{\theta}_0$. Figure 1
illustrates the fact that the disagreement between agents
$|J^\perp \boldsymbol{\theta}_n|$ converges to zero as $n$ tends to
infinity. Figure 2 shows the estimated value of the objective
function given by (18). The expectation in (18) is
estimated using 10'' Monte-Carlo trials at each iteration. 
Figure 1: $|J^\perp \boldsymbol{\theta}_n|$ as a function of $n$.
Figure 2: Estimated value of the objective function.